Combinatorial Motif Analysis in Yeast Gene Promoters: the Benefits of a Biological Consideration of Motifs
نویسندگان
چکیده
Combinatorial Motif Analysis in Yeast Gene Promoters: The Benefits of a Biological Consideration of Motifs. (December 2004) Kevin L. Childs, B.S., University of Michigan; Ph.D., Texas A&M University Chair of Advisory Committee: Dr. Thomas R. Ioerger There are three main categories of algorithms for identifying small transcription regulatory sequences in the promoters of genes, phylogenetic comparison, expectation maximization and combinatorial. For convenience, the combinatorial methods typically define motifs in terms of a canonical sequence and a set of sequences that have a small number of differences compared to the canonical sequence. Such motifs are referred to as (l, d)-motifs where l is the length of the motif and d indicates how many mismatches are allowed between an instance of the motif and the canonical motif sequence. There are limits to the complexity of the patterns of motifs that can be found by combinatorial methods. For some values of l and d, there will exist many sets of random words in a cluster of gene promoters that appear to form an (l, d)-motif. For these motifs, it will be impossible to distinguish biological motifs from randomly generated motifs. A better formalization of motifs is the (l, f, d)-motif that is derived from a biological consideration of motifs. The motivation for (l, f, d)-motifs comes from an examination of known transcription factor binding sites where typically a few positions in the motif are invariant. It is shown that there exist (l, f, d)-motifs that can be found in the promoters of gene
منابع مشابه
Exploring Gene Signatures in Different Molecular Subtypes of Gastric Cancer (MSS/ TP53+, MSS/TP53-): A Network-based and Machine Learning Approach
Gastric cancer (GC) is one of the leading causes of cancer mortality, worldwide. Molecular understanding of GC’s different subtypes is still dismal and it is necessary to develop new subtype-specific diagnostic and therapeutic approaches. Therefore developing comprehensive research in this area is demanding to have a deeper insight into molecular processes, underlying these subtypes. In this st...
متن کاملMolecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds
The aim of this study was to perform molecular and bioinformatics analysis of IGFBP2 gene promoter in association with some economic traits in indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB and a 297 bp fragment from the upstream sequences of studied gene was amplified and genotyped by single-strand conformational polymo...
متن کاملIdentification of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests
The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microa...
متن کاملFunctional motifs in Escherichia coli NC101
Escherichia coli (E. coli) bacteria can damage DNA of the gut lining cells and may encourage the development of colon cancer according to recent reports. Genetic switches are specific sequence motifs and many of them are drug targets. It is interesting to know motifs and their location in sequences. At the present study, Gibbs sampler algorithm was used in order to predict and find functional m...
متن کاملIntegrative Method for Identifying Combinatorial Regulation of Transcription Factors
To identify combinatorial regulation of transcription factors (TFs) and their binding motifs is important for understanding gene expression. However, the customary approach [2, 3] in computational microarray analysis is to cluster gene expression patterns and to identify individual sequence motifs specific to each gene cluster. The limitations of this approach are: 1) it does not directly addre...
متن کامل